Dataset statistics
| Number of variables | 10 |
|---|---|
| Number of observations | 1000 |
| Missing cells | 17 |
| Missing cells (%) | 0.2% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 78.2 KiB |
| Average record size in memory | 80.1 B |
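The two derived rows of this table can be checked from the raw counts. The following is a minimal sketch, assuming the missing-cell percentage is taken over all cells (variables × observations) and that 1 KiB is 1024 bytes; the variable names are illustrative only.

```python
# Figures copied from the dataset statistics table above.
n_variables = 10
n_observations = 1000
missing_cells = 17
total_size_kib = 78.2

# Share of missing cells among all cells (variables x observations).
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Average record size in bytes, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{missing_pct:.1f}%")        # 0.2%
print(f"{avg_record_bytes:.1f} B")  # 80.1 B
```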
Variable types
| Categorical | 4 |
|---|---|
| Numeric | 6 |
Alerts
| feat.e is highly overall correlated with feat.i | High correlation |
|---|---|
| feat.f is highly overall correlated with response | High correlation |
| feat.i is highly overall correlated with feat.e | High correlation |
| response is highly overall correlated with feat.f | High correlation |
| feat.a has unique values | Unique |
| feat.e has unique values | Unique |
| feat.f has unique values | Unique |
| feat.h has unique values | Unique |
| feat.i has unique values | Unique |
Reproduction
| Analysis started | 2022-11-23 20:34:29.326080 |
|---|---|
| Analysis finished | 2022-11-23 20:34:44.529022 |
| Duration | 15.2 seconds |
| Software version | pandas-profiling v3.5.0 |
| Download configuration | config.json |
response
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1000 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 553 | 55.3% |
| 0 | 447 | 44.7% |
Length
Histogram of lengths of the category
Common Values (Plot)
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 553 | 55.3% |
| 0 | 447 | 44.7% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 553 | 55.3% |
| 0 | 447 | 44.7% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 1000 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 553 | 55.3% |
| 0 | 447 | 44.7% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 1000 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 553 | 55.3% |
| 0 | 447 | 44.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 1000 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 553 | 55.3% |
| 0 | 447 | 44.7% |
feat.a
Real number (ℝ)
| Distinct | 1000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.0483836 |
| Minimum | -7.429324 |
| Maximum | 10.72312 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 353 |
| Negative (%) | 35.3% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | -7.429324 |
|---|---|
| 5-th percentile | -3.8677529 |
| Q1 | -0.88497273 |
| median | 1.0276289 |
| Q3 | 2.9938056 |
| 95-th percentile | 6.0284016 |
| Maximum | 10.72312 |
| Range | 18.152444 |
| Interquartile range (IQR) | 3.8787783 |
Descriptive statistics
| Standard deviation | 2.9750849 |
|---|---|
| Coefficient of variation (CV) | 2.8377828 |
| Kurtosis | -0.068601967 |
| Mean | 1.0483836 |
| Median Absolute Deviation (MAD) | 1.950913 |
| Skewness | 0.065392043 |
| Sum | 1048.3836 |
| Variance | 8.8511303 |
| Monotonicity | Not monotonic |
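Several of the figures above are simple functions of others in the same tables. As a rough consistency check, and assuming the usual definitions (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared), the reported feat.a values can be reproduced as follows.

```python
# Summary figures for feat.a, copied from the tables above.
minimum, maximum = -7.429324, 10.72312
q1, q3 = -0.88497273, 2.9938056
mean, std = 1.0483836, 2.9750849

value_range = maximum - minimum  # reported Range: 18.152444
iqr = q3 - q1                    # reported IQR: 3.8787783
cv = std / mean                  # reported CV: 2.8377828
variance = std ** 2              # reported Variance: 8.8511303

for name, value in [("range", value_range), ("iqr", iqr),
                    ("cv", cv), ("variance", variance)]:
    print(f"{name}: {value:.7f}")
```

Small differences in the last digit are expected because the inputs are themselves rounded.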
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.6814269397 | 1 | 0.1% |
| 0.1177140185 | 1 | 0.1% |
| 3.307156885 | 1 | 0.1% |
| 1.362157988 | 1 | 0.1% |
| 3.590945302 | 1 | 0.1% |
| 5.141543585 | 1 | 0.1% |
| 6.898744046 | 1 | 0.1% |
| 0.9148148358 | 1 | 0.1% |
| -5.747153271 | 1 | 0.1% |
| 1.094578014 | 1 | 0.1% |
| Other values (990) | 990 | 99.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -7.429324037 | 1 | 0.1% |
| -6.982768395 | 1 | 0.1% |
| -6.929446856 | 1 | 0.1% |
| -6.805099011 | 1 | 0.1% |
| -6.523753407 | 1 | 0.1% |
| -6.397694581 | 1 | 0.1% |
| -5.927506627 | 1 | 0.1% |
| -5.747153271 | 1 | 0.1% |
| -5.674963089 | 1 | 0.1% |
| -5.631899332 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10.7231198 | 1 | 0.1% |
| 9.07514201 | 1 | 0.1% |
| 9.054576998 | 1 | 0.1% |
| 8.726349291 | 1 | 0.1% |
| 8.714374438 | 1 | 0.1% |
| 8.65907834 | 1 | 0.1% |
| 8.463993632 | 1 | 0.1% |
| 8.374181476 | 1 | 0.1% |
| 8.290679957 | 1 | 0.1% |
| 8.250320061 | 1 | 0.1% |
feat.b
Real number (ℝ)
| Distinct | 992 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 8 |
| Missing (%) | 0.8% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -3.9415151 |
| Minimum | -8.5717913 |
| Maximum | 1.0855562 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 987 |
| Negative (%) | 98.7% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | -8.5717913 |
|---|---|
| 5-th percentile | -6.503702 |
| Q1 | -4.9824809 |
| median | -3.9177214 |
| Q3 | -2.8833259 |
| 95-th percentile | -1.5984046 |
| Maximum | 1.0855562 |
| Range | 9.6573476 |
| Interquartile range (IQR) | 2.099155 |
Descriptive statistics
| Standard deviation | 1.512669 |
|---|---|
| Coefficient of variation (CV) | -0.38377856 |
| Kurtosis | -0.083784143 |
| Mean | -3.9415151 |
| Median Absolute Deviation (MAD) | 1.0592358 |
| Skewness | -0.022693233 |
| Sum | -3909.983 |
| Variance | 2.2881674 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| -5.493698087 | 1 | 0.1% |
| -2.481194209 | 1 | 0.1% |
| -2.667889449 | 1 | 0.1% |
| -6.909910232 | 1 | 0.1% |
| -2.465197367 | 1 | 0.1% |
| -3.99181341 | 1 | 0.1% |
| -3.145331546 | 1 | 0.1% |
| -6.479883345 | 1 | 0.1% |
| -4.99998157 | 1 | 0.1% |
| -4.672351284 | 1 | 0.1% |
| Other values (982) | 982 | 98.2% |
| (Missing) | 8 | 0.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -8.571791335 | 1 | 0.1% |
| -8.042994054 | 1 | 0.1% |
| -7.943988162 | 1 | 0.1% |
| -7.906057256 | 1 | 0.1% |
| -7.824014162 | 1 | 0.1% |
| -7.693862525 | 1 | 0.1% |
| -7.566110388 | 1 | 0.1% |
| -7.503920998 | 1 | 0.1% |
| -7.470603661 | 1 | 0.1% |
| -7.376567763 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.085556232 | 1 | 0.1% |
| 0.935776165 | 1 | 0.1% |
| 0.7760667111 | 1 | 0.1% |
| 0.224126414 | 1 | 0.1% |
| 0.1960867204 | 1 | 0.1% |
| -0.1340978557 | 1 | 0.1% |
| -0.281881298 | 1 | 0.1% |
| -0.3280029895 | 1 | 0.1% |
| -0.3632662883 | 1 | 0.1% |
| -0.3756889403 | 1 | 0.1% |
feat.c
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1000 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | b |
|---|---|
| 2nd row | d |
| 3rd row | b |
| 4th row | a |
| 5th row | c |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| b | 278 | 27.8% |
| d | 255 | 25.5% |
| a | 240 | 24.0% |
| c | 227 | 22.7% |
Length
Histogram of lengths of the category
Common Values (Plot)
| Value | Count | Frequency (%) |
|---|---|---|
| b | 278 | 27.8% |
| d | 255 | 25.5% |
| a | 240 | 24.0% |
| c | 227 | 22.7% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| b | 278 | 27.8% |
| d | 255 | 25.5% |
| a | 240 | 24.0% |
| c | 227 | 22.7% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 1000 | 100.0% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| b | 278 | 27.8% |
| d | 255 | 25.5% |
| a | 240 | 24.0% |
| c | 227 | 22.7% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 1000 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| b | 278 | 27.8% |
| d | 255 | 25.5% |
| a | 240 | 24.0% |
| c | 227 | 22.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 1000 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| b | 278 | 27.8% |
| d | 255 | 25.5% |
| a | 240 | 24.0% |
| c | 227 | 22.7% |
feat.d
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 9 |
| Missing (%) | 0.9% |
| Memory size | 7.9 KiB |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 2973 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.0 |
|---|---|
| 2nd row | 1.0 |
| 3rd row | 1.0 |
| 4th row | 1.0 |
| 5th row | 1.0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.0 | 511 | 51.1% |
| 0.0 | 480 | 48.0% |
| (Missing) | 9 | 0.9% |
Length
Histogram of lengths of the category
Common Values (Plot)
| Value | Count | Frequency (%) |
| 1.0 | 511 | |
| 0.0 | 480 |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1471 | 49.5% |
| . | 991 | 33.3% |
| 1 | 511 | 17.2% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 1982 | 66.7% |
| Other Punctuation | 991 | 33.3% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1471 | 74.2% |
| 1 | 511 | 25.8% |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| . | 991 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 2973 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1471 | 49.5% |
| . | 991 | 33.3% |
| 1 | 511 | 17.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 2973 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1471 | 49.5% |
| . | 991 | 33.3% |
| 1 | 511 | 17.2% |
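The character-level counts for feat.d follow directly from its two category counts, because each value is the three-character string `1.0` or `0.0`. The sketch below rebuilds them with the standard library. It assumes missing rows contribute no characters, and the dictionary `value_counts` is simply a hand-copied stand-in for the column.

```python
from collections import Counter
import unicodedata

# Category counts for feat.d from the Common Values table (missing rows excluded).
value_counts = {"1.0": 511, "0.0": 480}

# Count every character, weighted by how often its value occurs.
char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

total_chars = sum(char_counts.values())

# Group by Unicode general category: "Nd" is Decimal Number, "Po" is Other Punctuation.
category_counts = Counter()
for ch, count in char_counts.items():
    category_counts[unicodedata.category(ch)] += count

for ch, count in char_counts.most_common():
    print(ch, count, f"{100 * count / total_chars:.1f}%")
# 0 1471 49.5%
# . 991 33.3%
# 1 511 17.2%
```

The totals match the report: 2973 characters overall, 1982 decimal digits and 991 punctuation marks.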
feat.e
Real number (ℝ)
| Distinct | 1000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -0.51832075 |
| Minimum | -6.7581764 |
| Maximum | 5.2897088 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 596 |
| Negative (%) | 59.6% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | -6.7581764 |
|---|---|
| 5-th percentile | -3.8409138 |
| Q1 | -1.7792963 |
| median | -0.51638151 |
| Q3 | 0.80107757 |
| 95-th percentile | 2.7744859 |
| Maximum | 5.2897088 |
| Range | 12.047885 |
| Interquartile range (IQR) | 2.5803738 |
Descriptive statistics
| Standard deviation | 1.9847034 |
|---|---|
| Coefficient of variation (CV) | -3.8291026 |
| Kurtosis | -0.034369636 |
| Mean | -0.51832075 |
| Median Absolute Deviation (MAD) | 1.2858637 |
| Skewness | -0.071509945 |
| Sum | -518.32075 |
| Variance | 3.9390475 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.8006149557 | 1 | 0.1% |
| -0.4105429274 | 1 | 0.1% |
| -2.149190977 | 1 | 0.1% |
| 0.6691611674 | 1 | 0.1% |
| -2.496597332 | 1 | 0.1% |
| -3.468563012 | 1 | 0.1% |
| 0.01555496617 | 1 | 0.1% |
| 0.3305800081 | 1 | 0.1% |
| 1.550839146 | 1 | 0.1% |
| 0.9521521358 | 1 | 0.1% |
| Other values (990) | 990 | 99.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -6.758176383 | 1 | 0.1% |
| -6.399743569 | 1 | 0.1% |
| -6.335952428 | 1 | 0.1% |
| -6.178752334 | 1 | 0.1% |
| -5.856328822 | 1 | 0.1% |
| -5.819338759 | 1 | 0.1% |
| -5.719050018 | 1 | 0.1% |
| -5.473047809 | 1 | 0.1% |
| -5.271585502 | 1 | 0.1% |
| -5.166574755 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5.289708782 | 1 | 0.1% |
| 4.807481457 | 1 | 0.1% |
| 4.698983411 | 1 | 0.1% |
| 4.696980464 | 1 | 0.1% |
| 4.628818601 | 1 | 0.1% |
| 4.580737248 | 1 | 0.1% |
| 4.466210225 | 1 | 0.1% |
| 4.245676957 | 1 | 0.1% |
| 3.971205537 | 1 | 0.1% |
| 3.96399422 | 1 | 0.1% |
feat.f
Real number (ℝ)
| Distinct | 1000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -6.2573455 |
| Minimum | -31.099076 |
| Maximum | 21.567936 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 789 |
| Negative (%) | 78.9% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | -31.099076 |
|---|---|
| 5-th percentile | -19.487967 |
| Q1 | -11.654848 |
| median | -6.2624289 |
| Q3 | -0.91253298 |
| 95-th percentile | 6.3077715 |
| Maximum | 21.567936 |
| Range | 52.667012 |
| Interquartile range (IQR) | 10.742315 |
Descriptive statistics
| Standard deviation | 8.0055295 |
|---|---|
| Coefficient of variation (CV) | -1.2793811 |
| Kurtosis | 0.17362413 |
| Mean | -6.2573455 |
| Median Absolute Deviation (MAD) | 5.3710946 |
| Skewness | 0.027865522 |
| Sum | -6257.3455 |
| Variance | 64.088503 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| -4.427601788 | 1 | 0.1% |
| 5.733868313 | 1 | 0.1% |
| -16.23534137 | 1 | 0.1% |
| -4.527911115 | 1 | 0.1% |
| -11.99220613 | 1 | 0.1% |
| -10.86518825 | 1 | 0.1% |
| -2.641091002 | 1 | 0.1% |
| 0.7347983651 | 1 | 0.1% |
| -2.958744506 | 1 | 0.1% |
| -10.27875464 | 1 | 0.1% |
| Other values (990) | 990 | 99.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -31.09907622 | 1 | 0.1% |
| -30.34507874 | 1 | 0.1% |
| -29.10103863 | 1 | 0.1% |
| -28.74414316 | 1 | 0.1% |
| -28.02887003 | 1 | 0.1% |
| -25.93349585 | 1 | 0.1% |
| -25.90821945 | 1 | 0.1% |
| -25.61192808 | 1 | 0.1% |
| -25.58897083 | 1 | 0.1% |
| -25.03581109 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 21.56793583 | 1 | 0.1% |
| 20.17426201 | 1 | 0.1% |
| 19.88434253 | 1 | 0.1% |
| 17.93220262 | 1 | 0.1% |
| 17.77268027 | 1 | 0.1% |
| 17.3905916 | 1 | 0.1% |
| 14.17918456 | 1 | 0.1% |
| 14.01412077 | 1 | 0.1% |
| 13.32045114 | 1 | 0.1% |
| 13.04330909 | 1 | 0.1% |
feat.g
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1000 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | z |
|---|---|
| 2nd row | x |
| 3rd row | y |
| 4th row | y |
| 5th row | z |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| z | 341 | 34.1% |
| x | 330 | 33.0% |
| y | 329 | 32.9% |
Length
Histogram of lengths of the category
Common Values (Plot)
| Value | Count | Frequency (%) |
|---|---|---|
| z | 341 | 34.1% |
| x | 330 | 33.0% |
| y | 329 | 32.9% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| z | 341 | 34.1% |
| x | 330 | 33.0% |
| y | 329 | 32.9% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 1000 | 100.0% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| z | 341 | 34.1% |
| x | 330 | 33.0% |
| y | 329 | 32.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 1000 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| z | 341 | 34.1% |
| x | 330 | 33.0% |
| y | 329 | 32.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 1000 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| z | 341 | 34.1% |
| x | 330 | 33.0% |
| y | 329 | 32.9% |
feat.h
Real number (ℝ)
| Distinct | 1000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.030833 |
| Minimum | 3.4212483 |
| Maximum | 17.431441 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 3.4212483 |
|---|---|
| 5-th percentile | 6.6924294 |
| Q1 | 8.7001442 |
| median | 10.028309 |
| Q3 | 11.528759 |
| 95-th percentile | 13.25505 |
| Maximum | 17.431441 |
| Range | 14.010193 |
| Interquartile range (IQR) | 2.8286147 |
Descriptive statistics
| Standard deviation | 2.0222002 |
|---|---|
| Coefficient of variation (CV) | 0.20159843 |
| Kurtosis | 0.039840298 |
| Mean | 10.030833 |
| Median Absolute Deviation (MAD) | 1.4211137 |
| Skewness | -0.13026894 |
| Sum | 10030.833 |
| Variance | 4.0892935 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 10.25419887 | 1 | 0.1% |
| 6.754303427 | 1 | 0.1% |
| 11.24278619 | 1 | 0.1% |
| 12.68298848 | 1 | 0.1% |
| 9.352376556 | 1 | 0.1% |
| 10.28876694 | 1 | 0.1% |
| 7.836294432 | 1 | 0.1% |
| 9.988045316 | 1 | 0.1% |
| 9.277988632 | 1 | 0.1% |
| 8.493310924 | 1 | 0.1% |
| Other values (990) | 990 | 99.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3.421248303 | 1 | 0.1% |
| 3.475701331 | 1 | 0.1% |
| 3.945908562 | 1 | 0.1% |
| 4.151417736 | 1 | 0.1% |
| 4.22770175 | 1 | 0.1% |
| 4.51677224 | 1 | 0.1% |
| 4.614397502 | 1 | 0.1% |
| 4.86227061 | 1 | 0.1% |
| 5.022192772 | 1 | 0.1% |
| 5.217552022 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 17.43144145 | 1 | 0.1% |
| 16.16747908 | 1 | 0.1% |
| 15.85780148 | 1 | 0.1% |
| 15.69748807 | 1 | 0.1% |
| 14.83414122 | 1 | 0.1% |
| 14.79040265 | 1 | 0.1% |
| 14.62396292 | 1 | 0.1% |
| 14.62363889 | 1 | 0.1% |
| 14.48832726 | 1 | 0.1% |
| 14.47632645 | 1 | 0.1% |
feat.i
Real number (ℝ)
| Distinct | 1000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -0.5186067 |
| Minimum | -6.7634268 |
| Maximum | 5.3157286 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 598 |
| Negative (%) | 59.8% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | -6.7634268 |
|---|---|
| 5-th percentile | -3.8868159 |
| Q1 | -1.773089 |
| median | -0.50608495 |
| Q3 | 0.80304068 |
| 95-th percentile | 2.7870572 |
| Maximum | 5.3157286 |
| Range | 12.079155 |
| Interquartile range (IQR) | 2.5761297 |
Descriptive statistics
| Standard deviation | 1.9843781 |
|---|---|
| Coefficient of variation (CV) | -3.8263643 |
| Kurtosis | -0.029327021 |
| Mean | -0.5186067 |
| Median Absolute Deviation (MAD) | 1.2926208 |
| Skewness | -0.071304314 |
| Sum | -518.6067 |
| Variance | 3.9377566 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| -0.8280728697 | 1 | 0.1% |
| -0.3358788662 | 1 | 0.1% |
| -2.138754969 | 1 | 0.1% |
| 0.7567881697 | 1 | 0.1% |
| -2.505539361 | 1 | 0.1% |
| -3.490892139 | 1 | 0.1% |
| 0.07054932741 | 1 | 0.1% |
| 0.3639592021 | 1 | 0.1% |
| 1.607569457 | 1 | 0.1% |
| 0.9758482567 | 1 | 0.1% |
| Other values (990) | 990 | 99.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -6.763426764 | 1 | 0.1% |
| -6.398746196 | 1 | 0.1% |
| -6.376277409 | 1 | 0.1% |
| -6.192309923 | 1 | 0.1% |
| -5.845633414 | 1 | 0.1% |
| -5.82995219 | 1 | 0.1% |
| -5.756198292 | 1 | 0.1% |
| -5.522454828 | 1 | 0.1% |
| -5.215154372 | 1 | 0.1% |
| -5.174046974 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5.315728559 | 1 | 0.1% |
| 4.842965736 | 1 | 0.1% |
| 4.715903484 | 1 | 0.1% |
| 4.646257329 | 1 | 0.1% |
| 4.588374531 | 1 | 0.1% |
| 4.550044682 | 1 | 0.1% |
| 4.446508874 | 1 | 0.1% |
| 4.248004011 | 1 | 0.1% |
| 3.966186668 | 1 | 0.1% |
| 3.947004253 | 1 | 0.1% |
Auto
The auto setting is an interpretable pairwise column metric with the following mapping:
- Variable_type-Variable_type : Method, Range
- Categorical-Categorical : Cramér's V, [0,1]
- Numerical-Categorical : Cramér's V, [0,1] (using a discretized numerical column)
- Numerical-Numerical : Spearman's ρ, [-1,1]
This configuration uses the recommended metric for each pair of columns.
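Cramér's V, which the mapping above uses whenever a categorical column is involved, can be sketched in plain Python. The version below applies the bias correction proposed by Bergsma (2013). It is an illustration under that assumption, not the profiling library's own code, and the function name `cramers_v_corrected` is hypothetical. Discretizing a numerical column first is not shown.

```python
import math
from collections import Counter

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equal-length sequences of categories."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)
    if n < 2 or r < 2 or k < 2:
        return 0.0  # a constant column carries no association (choice made for this sketch)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    # Bias correction of phi^2 and of the table dimensions.
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

# Perfectly associated columns score close to 1, independent ones score 0.
print(cramers_v_corrected(["a", "b"] * 50, ["x", "y"] * 50))
print(cramers_v_corrected(["a", "a", "b", "b"] * 25, ["x", "y", "x", "y"] * 25))
```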
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
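The calculation described above (covariance of the ranks divided by the product of their standard deviations) can be written out directly. This is a minimal sketch using only the standard library. The helper names are illustrative, and ties are given average ranks, which is a common convention assumed here.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear
print(spearman(x, y), pearson(x, y))
```

On this example ρ is (up to floating-point error) 1 because the relationship is perfectly monotonic, while Pearson's r on the raw values is lower because the relationship is not linear.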
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
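The formula and the invariance property can both be checked numerically. The sketch below uses made-up data and an illustrative function name, `pearson_r`. It computes r, then recomputes it after shifting and rescaling both variables by positive factors.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

# Illustrative data: y is roughly 2x with a little noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Change location and scale of each variable separately (positive scale factors).
r_transformed = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

print(r, r_transformed)  # the two values agree up to floating-point error
```

A negative scale factor on exactly one variable would flip the sign of r but leave its magnitude unchanged.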
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
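The pair-counting definition above translates almost word for word into code. The sketch below implements that plain definition (often called τ-a). Note that libraries may default to a tie-adjusted variant such as τ-b, so results can differ when the data contain ties. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # pairs tied in x or y count as neither
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

print(kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0
print(kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # one discordant pair out of six
```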
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013, which can be found here.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
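One way to make nullity correlation concrete is to correlate the "is present" indicators of two columns. The sketch below does this with the standard library. It assumes missing entries are represented as `None` and uses Pearson correlation on the 0/1 indicators. The report's own heatmap code is not shown in the source, so this illustrates the idea rather than reproducing it.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        # A column that is always present (or always missing) has no nullity variance.
        return float("nan")
    return cov / math.sqrt(var_a * var_b)

# Hypothetical columns: b is missing exactly where a is missing, c where a is present.
a = [1.2, None, 3.4, None]
b = ["x", None, "y", None]
c = [None, 5, None, 7]
print(nullity_correlation(a, b))  # close to +1: the columns are missing together
print(nullity_correlation(a, c))  # close to -1: one is present when the other is missing
```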
| | response | feat.a | feat.b | feat.c | feat.d | feat.e | feat.f | feat.g | feat.h | feat.i |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | -0.681427 | -5.493698 | b | 0.0 | -0.800615 | -4.427602 | z | 10.254199 | -0.828073 |
| 1 | 1 | 0.309468 | -5.559933 | d | 1.0 | -1.155514 | -0.799094 | x | 9.084749 | -1.109698 |
| 2 | 1 | 5.676125 | -4.026970 | b | 1.0 | -3.396331 | -0.631966 | y | 8.753848 | -3.417417 |
| 3 | 1 | 1.211525 | -4.198263 | a | 1.0 | -1.894569 | -16.273262 | y | 12.191295 | -1.904801 |
| 4 | 1 | 1.387863 | -7.824014 | c | 1.0 | 4.696980 | -22.208877 | z | 9.626686 | 4.715903 |
| 5 | 1 | 6.145195 | -2.439140 | c | 0.0 | -0.574830 | 11.642609 | y | 12.362962 | -0.521423 |
| 6 | 1 | 2.382749 | -3.625411 | a | 0.0 | 1.326984 | -4.148881 | z | 9.226122 | 1.287618 |
| 7 | 1 | -2.795184 | -0.375689 | c | 1.0 | -0.869053 | -2.994862 | x | 7.973038 | -0.839326 |
| 8 | 0 | -1.060559 | -2.972203 | b | 0.0 | 0.719649 | -15.543748 | z | 12.893124 | 0.718503 |
| 9 | 1 | -0.336986 | -4.670439 | b | 1.0 | -0.605454 | 3.060399 | y | 9.803020 | -0.548610 |
| | response | feat.a | feat.b | feat.c | feat.d | feat.e | feat.f | feat.g | feat.h | feat.i |
|---|---|---|---|---|---|---|---|---|---|---|
| 990 | 1 | 3.027287 | -5.645709 | b | 1.0 | -4.847993 | -8.070246 | x | 12.043355 | -4.894286 |
| 991 | 1 | -2.222620 | -2.611733 | a | 0.0 | 0.735233 | -2.876741 | x | 10.090726 | 0.816545 |
| 992 | 0 | 2.363733 | -3.629801 | b | 0.0 | -5.109591 | -7.578162 | y | 9.301541 | -5.119936 |
| 993 | 0 | 0.360079 | -5.105157 | d | 0.0 | -1.393937 | -21.575596 | y | 10.537327 | -1.397865 |
| 994 | 0 | 1.939686 | -5.920013 | b | 0.0 | 0.098981 | -17.421105 | z | 11.705579 | 0.084855 |
| 995 | 0 | 0.730074 | -3.885035 | b | 0.0 | -3.356949 | -12.803344 | z | 11.204110 | -3.396673 |
| 996 | 1 | 4.211548 | -3.617253 | a | 0.0 | 2.034995 | 6.995753 | z | 9.208089 | 2.069752 |
| 997 | 1 | -3.053301 | -3.583830 | c | 1.0 | 1.929012 | -7.013105 | z | 7.637862 | 1.856356 |
| 998 | 1 | -0.567850 | -3.194716 | c | 1.0 | -1.849712 | 4.204816 | z | 11.725868 | -1.862466 |
| 999 | 1 | 0.252428 | -4.690728 | d | 1.0 | 1.742044 | -4.564031 | y | 7.909709 | 1.747037 |